A stochastic analysis of financial data is presented. In particular we investigate how the statistics
of log returns change with different time delays $\tau$. The scale dependent behaviour of financial data
can be divided into two regions. The first time-range, the small-timescale region (in the range
of seconds), seems to be characterized by universal features. The second time-range, the
medium-timescale range, extends from several minutes upwards and can be characterized by a cascade
process, which is given by a stochastic Markov process in the scale $\tau$. A corresponding
Fokker-Planck equation can be extracted from given data and provides a non-equilibrium
thermodynamical description of the complexity of financial data.

PACS numbers: 02.50.Ga, 05.45.Tp 



I. INTRODUCTION

One of the remarkable features of the complexity of the financial market is that very often
financial quantities display non-Gaussian statistics, often denoted as heavy tailed or
intermittent statistics; for further details see [...].

II. SMALL SCALE ANALYSIS

To characterize the fluctuations of a financial time series $x(t)$, most commonly quantities like
returns, log-returns or price increments are used. Here, we consider the statistics of the log
return $y(\tau)$ over a certain timescale $\tau$, which is defined as:

$$y(\tau) = \log x(t + \tau) - \log x(t), \tag{1}$$

where $x(t)$ denotes the price of the asset at time $t$. We suppressed the dependence of the log
return $y(\tau)$ on the time $t$, since we assume the underlying stochastic process to be
stationary. In this paper we present mainly results for Bayer for the time span of 1993 to 2003.
The financial data sets were provided by the Karlsruher Kapitalmarkt Datenbank (KKMDB) [10]. The
graph of the logarithm of the price time series is shown in Fig. 1.

FIG. 1: Log price for Bayer for the years 1993-2003.
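
The definition of the log return in Eq. (1) can be illustrated with a minimal sketch. It assumes an evenly sampled price series, so that the timescale is expressed as a lag in samples; the function names `log_returns` and `standardize` and the short price array are hypothetical and not taken from the data set used here.

```python
import numpy as np

def log_returns(prices, lag):
    """Log returns y = log x(t + lag) - log x(t), with the lag given in samples."""
    if lag < 1:
        raise ValueError("lag must be a positive number of samples")
    log_price = np.log(np.asarray(prices, dtype=float))
    return log_price[lag:] - log_price[:-lag]

def standardize(y):
    """Rescale a sample to zero mean and unit standard deviation."""
    y = np.asarray(y, dtype=float)
    return (y - y.mean()) / y.std()

# Hypothetical price series, assumed to be sampled once per second.
prices = np.array([100.0, 101.0, 99.5, 100.5, 102.0, 101.0])
y2 = log_returns(prices, lag=2)   # log returns on a timescale of two samples
z2 = standardize(y2)              # shape-only version used for comparing pdfs
```

Standardizing the returns before comparing distributions removes the trivial growth of the variance with the timescale, so that only the shape of the pdf enters.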



First we look at the statistics of $p(y(\tau))$ as shown in Fig. 2. Here we find the remarkable
feature of financial data that the probability density functions (pdfs) are not Gaussian, but
exhibit heavy tailed shapes. Another remarkable feature is the change of the shape with the size
of the scale variable $\tau$. To analyse the changing statistics of the pdfs with the scale $\tau$
a non-parametric approach is chosen. The distance between the pdf $p(y(\tau))$ on a timescale
$\tau$ and a pdf $p_T(y(T))$ on a reference timescale $T$ is computed. As a reference timescale,
$T = 1$ s is chosen. In order to look only at the shape of the pdfs and to exclude effects due to
varying mean and variance, all pdfs $p(y(\tau))$ have been normalized to zero mean and a standard
deviation of 1.

As a measure to quantify the distance between two distributions $p(y(\tau))$ and $p_T(y(T))$, the
Kullback-Leibler entropy [11]

$$d_K\big(p(y(\tau)), p_T(y(T))\big) = \int_{-\infty}^{+\infty} p(y(\tau)) \, \ln\!\left(\frac{p(y(\tau))}{p_T(y(T))}\right) dy \tag{2}$$

is used. In Fig. 3 the evolution of $d_K$ with increasing $\tau$ is shown, which measures the
change of the shape of the pdfs. For different stocks we found that for timescales smaller than
about one minute a linear growth of the distance measure seems to be universally present, see
Fig. 3 a).

If a normalised Gaussian distribution is taken as the reference distribution, the fast deviation
from the Gaussian shape in the small timescale regime becomes evident, as displayed in Fig. 3 b).
The independence of this small scale behaviour of the particular choice of the measure and of the
choice of the stock is shown in [12].
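
A histogram-based estimate of the Kullback-Leibler distance between two standardized samples might look as follows. This is only a sketch under stated assumptions: the binning, the value range and the treatment of empty bins are choices made here for illustration, and the function name `kl_distance` is hypothetical.

```python
import numpy as np

def kl_distance(sample_p, sample_q, bins=50, value_range=(-5.0, 5.0)):
    """Histogram estimate of the integral of p * ln(p / q) over y.

    Both samples are assumed to be standardized (zero mean, unit variance).
    Bins in which either estimated density is zero are skipped, which is a
    simplification of this sketch.
    """
    p_counts, edges = np.histogram(sample_p, bins=bins, range=value_range)
    q_counts, _ = np.histogram(sample_q, bins=edges)
    width = edges[1] - edges[0]
    p = p_counts / (p_counts.sum() * width)   # density estimate of the first sample
    q = q_counts / (q_counts.sum() * width)   # density estimate of the reference
    mask = (p > 0) & (q > 0)
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])) * width)

# Synthetic illustration: Gaussian sample against a heavy-tailed reference.
rng = np.random.default_rng(0)
gauss = rng.normal(size=200_000)
laplace = rng.laplace(scale=1.0 / np.sqrt(2.0), size=200_000)  # unit variance
d_same = kl_distance(gauss, gauss)
d_diff = kl_distance(gauss, laplace)
```

Evaluating such an estimate for a sequence of timescales against a fixed reference gives a curve of the kind described above; the one-sigma error bounds would require an additional resampling step that is not shown.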



III. MEDIUM SCALE ANALYSIS



Next the behaviour for larger timescales ($\tau > 1$ min) is discussed. Here we proceed with the
analysis using the idea of a cascade. As has been shown by [...], it is possible to grasp the
complexity of financial data by cascade processes running in the variable $\tau$. In particular
it has been shown that it is possible to estimate directly from given data a stochastic cascade
process in the form of a Fokker-Planck equation [13, 14]. The underlying idea of this approach is
to access statistics of all orders of the financial data by the general joint n-scale probability
densities $p(y_1, \tau_1; y_2, \tau_2; \ldots; y_n, \tau_n)$. (Here we use the shorthand notation
$y_i = y(\tau_i)$ and take without loss of generality $\tau_i < \tau_{i+1}$. The smaller log
returns $y(\tau_i)$ are nested inside the larger log returns $y(\tau_{i+1})$ with common end
point $t$.)

FIG. 2: Unconditional probability densities $p(y(\tau))$ for the timescales of $\tau$ = 240 s,
454 s, 955 s, 1800 s and 3766 s (bottom up) obtained from the original data (dots) and
reconstructed from the extracted Fokker-Planck equation (dashed lines).


The joint pdfs can be expressed as well by the multiple conditional probability densities
$p(y_i, \tau_i | y_{i+1}, \tau_{i+1}; \ldots; y_n, \tau_n)$. This very general n-scale
characterization of a data set, which contains the general n-point statistics, can be simplified
substantially if there is a stochastic process in $\tau$ which is a Markov process. This is the
case if the conditional probability densities fulfil the following relations:

$$p(y_1, \tau_1 | y_2, \tau_2; y_3, \tau_3; \ldots; y_N, \tau_N) = p(y_1, \tau_1 | y_2, \tau_2). \tag{3}$$



Consequently,

$$p(y_1, \tau_1; \ldots; y_N, \tau_N) = p(y_1, \tau_1 | y_2, \tau_2) \cdot \ldots \cdot p(y_{N-1}, \tau_{N-1} | y_N, \tau_N) \cdot p(y_N, \tau_N) \tag{4}$$

holds.

Equation (4) indicates the importance of the conditional pdf for Markov processes. Knowledge of
$p(y, \tau | y_0, \tau_0)$ (for arbitrary scales $\tau$ and $\tau_0$ with $\tau < \tau_0$) is
sufficient to generate the entire statistics of the increment, encoded in the N-point probability
density $p(y_1, \tau_1; y_2, \tau_2; \ldots; y_N, \tau_N)$.

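For Markov processes the conditional probability density satisfies a master equation, which can
be put into the form of a Kramers-Moyal expansion for which the Kramers-Moyal coefficients
$D^{(k)}(y, \tau)$ are defined as the limit $\Delta\tau \to 0$ of the conditional moments
$M^{(k)}(y, \tau, \Delta\tau)$:

$$D^{(k)}(y, \tau) = \lim_{\Delta\tau \to 0} M^{(k)}(y, \tau, \Delta\tau), \tag{5}$$

$$M^{(k)}(y, \tau, \Delta\tau) = \frac{1}{k!\,\Delta\tau} \int_{-\infty}^{+\infty} (\tilde{y} - y)^k \, p(\tilde{y}, \tau - \Delta\tau | y, \tau) \, d\tilde{y}. \tag{6}$$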
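
The conditional moments can be estimated directly from a time series by binning the larger log return and averaging powers of the difference to the nested smaller one. The sketch below is an illustration under assumptions: lags are given in samples, increments share a common right end point, the normalisation $1/(k!\,\Delta\tau)$ is the standard Kramers-Moyal convention, and the name `conditional_moments` and the `min_count` threshold are hypothetical.

```python
import numpy as np
from math import factorial

def conditional_moments(log_price, tau, dtau, k, bin_edges, min_count=10):
    """Estimate M^(k)(y, tau, dtau) in bins of the larger log return y(tau).

    tau and dtau are lags in samples with 0 < dtau < tau. Both increments end
    at the same time index, so the smaller one is nested inside the larger.
    """
    lp = np.asarray(log_price, dtype=float)
    end = lp[tau:]                                    # log price at the common end point
    y_large = end - lp[:-tau]                         # y(tau)
    y_small = end - lp[dtau:len(lp) - tau + dtau]     # y(tau - dtau)
    which = np.digitize(y_large, bin_edges) - 1       # bin index of the conditioning value
    n_bins = len(bin_edges) - 1
    moments = np.full(n_bins, np.nan)                 # NaN marks poorly populated bins
    for b in range(n_bins):
        sel = which == b
        if sel.sum() >= min_count:
            diff = y_small[sel] - y_large[sel]
            moments[b] = np.mean(diff ** k) / (factorial(k) * dtau)
    centers = 0.5 * (bin_edges[:-1] + bin_edges[1:])
    return centers, moments

# Synthetic check on a Gaussian random walk (not financial data).
rng = np.random.default_rng(0)
walk = np.cumsum(rng.normal(size=200_000))
edges = np.linspace(-6.0, 6.0, 13)
centers, m1 = conditional_moments(walk, tau=10, dtau=1, k=1, bin_edges=edges)
```

For a plain random walk the first conditional moment is linear in $y$ with slope $-1/\tau$, which gives a simple sanity check of the estimator. The limit $\Delta\tau \to 0$ in Eq. (5) still has to be approximated, for example by extrapolating over several small values of `dtau`.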

For a general stochastic process, all Kramers-Moyal coefficients are different from zero.
According to Pawula's theorem, however, the Kramers-Moyal expansion stops after the second term,
provided that the fourth order coefficient $D^{(4)}(y, \tau)$ vanishes. In that case, the
Kramers-Moyal expansion reduces to a Fokker-Planck equation (also known as the backwards or
second Kolmogorov equation):

$$-\frac{\partial}{\partial\tau} p(y, \tau | y_0, \tau_0) = \left\{ -\frac{\partial}{\partial y} D^{(1)}(y, \tau) + \frac{\partial^2}{\partial y^2} D^{(2)}(y, \tau) \right\} p(y, \tau | y_0, \tau_0). \tag{7}$$

$D^{(1)}$ is denoted as drift term, $D^{(2)}$ as diffusion term.

The probability density $p(y, \tau)$ has to satisfy the same equation, as can be shown by a
simple integration of Eq. (7).
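
The integration step can be written out briefly. The notation $L_{FP}$ for the Fokker-Planck operator is introduced here only as an abbreviation, and the form $L_{FP} = -\partial_y D^{(1)} + \partial_y^2 D^{(2)}$ is an assumption of this sketch; the argument only uses that the operator acts on $y$ and $\tau$ but not on $y_0$.

```latex
% Assumed abbreviation: L_FP(y, tau) = -d/dy D^(1)(y, tau) + d^2/dy^2 D^(2)(y, tau).
\begin{align}
p(y,\tau) &= \int_{-\infty}^{+\infty} p(y,\tau \,|\, y_0,\tau_0)\, p(y_0,\tau_0)\, dy_0 , \\
-\frac{\partial}{\partial\tau}\, p(y,\tau)
  &= \int_{-\infty}^{+\infty} \Big[ -\frac{\partial}{\partial\tau}\, p(y,\tau \,|\, y_0,\tau_0) \Big]\, p(y_0,\tau_0)\, dy_0 \\
  &= L_{FP}(y,\tau) \int_{-\infty}^{+\infty} p(y,\tau \,|\, y_0,\tau_0)\, p(y_0,\tau_0)\, dy_0
   = L_{FP}(y,\tau)\, p(y,\tau) .
\end{align}
```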



IV. RESULTS FOR BAYER 

FIG. 3: Distance measure $d_K$ for a reference distribution $p_T(y)$ for Bayer. a) As reference
timescale $T = 1$ s is chosen. The bold dots represent the estimated value, the dotted lines the
one-sigma error bound and the solid line the linear fit for the first region, after [...]. b) As
a reference distribution $p_T(y)$ a normalised Gaussian distribution is chosen.

From the data shown in Fig. 1 the Kramers-Moyal coefficients were calculated according to
Eqs. (5) and (6). For this purpose we divided the timescale into intervals

$$\left[ \tfrac{1}{2}(\tau_{i-1} + \tau_i), \; \tfrac{1}{2}(\tau_i + \tau_{i+1}) \right],$$

assuming that the Kramers-Moyal coefficients are constant with respect to the timescale $\tau$ in
each of these subintervals of the timescale. We started with a smallest timescale of 240 s and
continued in such a way that $\tau_i = 0.9 \cdot \tau_{i+1}$. The Kramers-Moyal coefficients
themselves were parameterised in the following form:

$$D^{(1)} = \alpha_0 + \alpha_1 y, \tag{8}$$

$$D^{(2)} = \beta_0 + \beta_1 y + \beta_2 y^2. \tag{9}$$

The coefficients we obtained by this procedure are shown in Fig. 4. This result shows that the
rich and complex structure of financial data, expressed by multiscale statistics, can be pinned
down to coefficients with a quite simple functional form.
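
The construction of the scale grid and the fit of the parameterisation can be sketched as follows. This is an illustration only: it assumes a linear drift and a quadratic diffusion term, uses ordinary least squares via `numpy.polyfit`, and the function names `scale_grid` and `fit_parametrisation` are hypothetical. The actual estimation procedure may differ.

```python
import numpy as np

def scale_grid(tau_min, tau_max, ratio=0.9):
    """Scales with tau_i = ratio * tau_{i+1}, starting at tau_min, up to tau_max."""
    taus = [float(tau_min)]
    while taus[-1] / ratio <= tau_max:
        taus.append(taus[-1] / ratio)
    return np.array(taus)

def fit_parametrisation(y, d1, d2):
    """Least-squares fit of D1 = a0 + a1*y and D2 = b0 + b1*y + b2*y**2.

    Entries that are NaN (e.g. poorly populated bins) are ignored.
    """
    y, d1, d2 = (np.asarray(a, dtype=float) for a in (y, d1, d2))
    ok1, ok2 = np.isfinite(d1), np.isfinite(d2)
    a1, a0 = np.polyfit(y[ok1], d1[ok1], 1)        # polyfit returns highest power first
    b2, b1, b0 = np.polyfit(y[ok2], d2[ok2], 2)
    return (a0, a1), (b0, b1, b2)

# Scales from 240 s upwards; the upper limit of 4000 s is chosen for illustration.
taus = scale_grid(240.0, 4000.0)

# Synthetic coefficient estimates that follow the assumed form exactly.
y = np.linspace(-1.0, 1.0, 21)
alphas, betas = fit_parametrisation(y, 0.1 - 0.5 * y, 0.2 + 0.3 * y ** 2)
```

Repeating the fit for every scale interval yields one set of five parameters per interval, which is the kind of information summarised in a plot of the parameters against the timescale.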

To show the quality of our results we reconstruct the measured statistics from the estimated
Fokker-Planck equations. At first, the conditional probability densities
$p(y(\tau_i) | y(\tau_{i+1}))$ were reconstructed. As an example the conditional probability
density $p(y(\tau = 3389\,\mathrm{s}) | y(\tau = 3766\,\mathrm{s}))$ is shown in Fig. 5. The
reconstructed conditional probability density and the one calculated directly from the data are
in good agreement. As a next step we used the pdf on the scale of $\tau = 27900$ s and the
reconstructed conditional probability densities to calculate the increment pdfs on timescales
between four minutes and one hour. The results for the timescales of $\tau$ = 3766 s, 1800 s,
955 s, 454 s and 240 s are shown in Fig. 2. Again the agreement between unconditional probability
densities $p(y(\tau))$ of the original data (dots) and the reconstructed ones (broken lines) is
very good.
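
One step of such a reconstruction can be sketched by integrating a conditional pdf against the pdf on the larger scale. The sketch below swaps in the Gaussian short-step approximation of the Fokker-Planck propagator, with mean $y + D^{(1)}(y)\,\Delta\tau$ and variance $2 D^{(2)}(y)\,\Delta\tau$, instead of a full numerical solution; this approximation, the grid and the function name `propagate_pdf` are assumptions made for illustration.

```python
import numpy as np

def propagate_pdf(pdf, grid, d1, d2, dtau):
    """Map a pdf on scale tau to scale tau - dtau on an evenly spaced grid.

    d1 and d2 are callables returning drift and (positive) diffusion on the
    grid. The conditional pdf is approximated by a Gaussian short-step
    propagator, which is only adequate for small dtau.
    """
    dy = grid[1] - grid[0]
    mean = grid + d1(grid) * dtau          # conditional mean for each starting value
    var = 2.0 * d2(grid) * dtau            # conditional variance for each starting value
    # kernel[j, i] approximates p(grid[j], tau - dtau | grid[i], tau)
    kernel = np.exp(-(grid[:, None] - mean[None, :]) ** 2 / (2.0 * var[None, :]))
    kernel /= np.sqrt(2.0 * np.pi * var[None, :])
    new_pdf = kernel @ pdf * dy            # integrate over the larger-scale variable
    return new_pdf / (new_pdf.sum() * dy)  # renormalise against discretisation error

# Synthetic illustration with hypothetical coefficients, not fitted values.
grid = np.linspace(-10.0, 10.0, 1001)
start = np.exp(-grid ** 2 / 2.0) / np.sqrt(2.0 * np.pi)
step = propagate_pdf(start, grid,
                     d1=lambda y: -0.5 * y,
                     d2=lambda y: np.full_like(y, 0.1),
                     dtau=0.1)
```

Applying the step repeatedly, with the coefficients of the corresponding scale interval, carries a pdf from the largest scale down to the smaller ones.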

V. DISCUSSION 

The results indicate that for financial data there are two scale regimes. In the small scale
regime the shape of the pdfs changes very fast and a measure like the Kullback-Leibler entropy
increases linearly. At timescales of a few seconds not all available information may be included
in the price, and processes necessary for price formation take place. Nevertheless this regime
seems to exhibit a well defined structure, expressed by the very simple functional form of the
Kullback-Leibler entropy with respect to the timescale $\tau$.

FIG. 4: The parameters $\alpha_0$, $\alpha_1$, $\beta_0$, $\beta_1$ and $\beta_2$ of the
parameterisation of the Kramers-Moyal coefficients used for the reconstruction.

FIG. 5: Conditional probability density
$p(y(\tau = 3389\,\mathrm{s}) | y(\tau = 3766\,\mathrm{s}))$ of given data (unbroken lines) and
reconstructed by the numerical solution of the Fokker-Planck equation (broken lines).

Based on a stochastic analysis we have shown that a second time range, the medium scale range,
exists, where multiscale joint probability densities can be expressed by a stochastic cascade
process. Here the information on the comprehensive multiscale statistics can be expressed by
simple conditional probability densities. This simplification may be seen in analogy to the
thermodynamical description of a gas by means of statistical mechanics. The comprehensive
statistical quantity for the gas is the joint n-particle probability density, describing the
location and the momentum of all the individual particles. One essential simplification for the
kinetic gas theory is the single particle approximation. The Boltzmann equation is an equation
for the time evolution of the probability density $p(x, p, t)$ in one-particle phase space, where
$x$ and $p$ are position and momentum, respectively. In analogy to this we have obtained for the
financial data a Fokker-Planck equation for the evolution of the conditional probabilities
$p(y_i, \tau_i | y_{i+1}, \tau_{i+1})$ in the scale $\tau$. In our cascade picture the
conditional probabilities cannot be reduced further to single probability densities
$p(y_i, \tau_i)$ without loss of information, as is done in the kinetic gas theory.

As a last point we want to mention that based on the information of the Fokker-Planck equation
it is possible to generate artificial data sets. As pointed out in [15], the knowledge of
conditional probabilities can be used to generate time series. One important point is that one
uses increments $y(\tau)$ with common right endpoints. With the knowledge of the n-scale
conditional probability density of all $y(\tau_i)$ the stochastically correct next point can be
selected. We could show that time series for turbulent data generated by this procedure even
reproduce quite well the conditional probability densities, the central quantity for a
comprehensive multiscale characterization.